.SEC(Storage#Structures and#Efficiency,,,P210:)
.SS(Introduction)
.FP
Though it is true that any algorithm can be coded in terms of
manipulations of binary trees, there are many instances in which more
efficient organizations of data exist.
At the most elementary level are vectors and arrays. These
data structures could certainly be stored in a list-structure format
and individual components accessed via %3car-cdr%* chains. However, most
machines have a hardware organization which can be exploited to
increase accessing efficiency over the list representation.
Similarly, strings can be represented as lists of characters. The
string processing operations are expressible as LISP algorithms. But
again this is usually not the most reasonable representation. Even at
the level of list-structure operations, simple binary trees might not
be the most expeditious representation for every problem. Also many
of the algorithms we have presented in LISP are overly wasteful of
computation time. This section will begin an examination of
alternatives to LISP organization.
There is no best data
representation, nor is there a best algorithm. The chosen representation
must match the machine and the problem domain being studied. If
the problem is strictly numerical, then list-structure is not the
answer; if simple string manipulation is sufficient,
then list processing is too general. And there are many applications
of list processing which are sufficiently well#behaved that
complex devices like garbage collectors are unnecessary.
The point is that if you have seen the list-processing techniques in
a rich environment such as LISP and have seen the applications to
which LISP may be put, then you will be prepared to apply these
techniques in a meaningful way. Many times a quick
representation in LISP is all that is needed. You get a clean
representation with comprehensible algorithms. Once you've studied
the algorithms, efficiencies might come to mind. At that time either
recode the problem using some of the obscene LISP programming tricks
or drop into machine language if very fine control is required. Once
you have %2a%* representation it is easy to get better ones. For
example, once you have a compiler which is correct it is
easier to describe a smarter one.
This section will describe representations other than LISP binary trees
and will show some ways of increasing the efficiency of LISP programs.
.SS(Bit-tables,,P263:)
.FP
In the marking phase of a garbage collector it is
necessary to record the visitation of each word.
Frequently it is not possible to place a mark in the actual word.
This might occur for several reasons:
.SKIP 1
.NL;
%21.%1##For words in FS, there is no
room if each word contains exactly two addresses.
.NL;
%22.%1# The word is
in FWS and the meaning of the information stored there would be
changed if we set a bit.
.NL;
%23.%1# In structures more complex than dotted pairs, there may not be room for
marking bits.
.NL;
%24.%1# If a bit is assigned to each word, then the initialize phase
requires that we visit each such word. This violates
"locality of reference".⊗↓Locality refers to the relative distance between
memory locations assigned in a particular structure. In some machine organizations,
memory is divided into "pages" of a relatively small size. There is significant
overhead involved in crossing page boundaries. Therefore memory referencing
which entails many scattered references is said to violate "locality of
reference."←
.pt24;
.FP
An alternative solution is to designate a separate section of memory called a
%2bit-table%1. The bit-table is a sequence of binary flags; and there
is a one-to-one correspondence between a flag and a markable location.
Whenever we wish to record the visiting of a word, we set the corresponding flag.
A bit table is represented as a sequence of machine locations with several
flags represented in each word. There is a simple calculation to relate a
bit in a word to its corresponding markable location.
It is frequently faster to initialize a whole
table than to zero single bits scattered through separate words.
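.FP
For example, if each bit-table word holds %3wordsize%1 flags and the markable
area begins at location %3base%1, then the flag for the cell at location %3loc%1
can be found by a simple division; here is a minimal sketch, where the names
%3base%1, %3wordsize%1, %3bt_word%1, and %3bt_bit%1 are ours:
.BEGIN CENTERIT;GROUP;
←%3bt_word <= λ[[loc] quotient[difference[loc;base];wordsize]]%1
←%3bt_bit <= λ[[loc] remainder[difference[loc;base];wordsize]]%1
.END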
.SS(Vectors and Arrays,,P137:)
.BEGIN TABIT2 (10,40);CENTERIT;FILL;
.FP
%2Vectors%*.##Vectors, also known as one-dimensional arrays, are usually
stored sequentially in memory. Simple vectors are usually stored one
element to a memory location though this is not a necessity. For
example, a complex vector is very naturally stored as pairs of cells;
or, if vectors of nonhomogeneous data modes were allowed,
each element would have to include type information. In any case,
most languages place some restrictions on the behavior of vectors so
that elements can be accessed efficiently. There is a natural
simulation of a stack as a (sequential) vector with access
made via a global pointer to the vector.
.pt24;
.FP
%2Arrays%*.##Arrays are multi-dimensional data structures. Since
most machine memories are linear devices we must map arrays onto a
linear representation. We will restrict attention to the case of
two-dimensional arrays. Most of the discussion generalizes very
naturally. A very common device is to store the array by rows; that
is, each row is stored sequentially - first, row 1; then row 2,#...
and so on.
Given this representation there is a trivial calculation to find the
location of an arbitrary element, A[i;j], if we know the location of
the first element A[1;1] and the extent of the dimensions of the
array. For an array A[1:M; 1:N],
.EQ;
loc[A[i;j]] = loc [A[1;1]] + (i-1)*N + (j-1)
.pt18;
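.FP
For example, in an array A[1:3; 1:4] the element A[2;3] is found at
loc[A[1;1]] + (2-1)*4 + (3-1), that is, six locations beyond A[1;1].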
.FP
In languages like Fortran, which require that the size of the array be
known at compile-time, the compiler can generate the accessing code
without problem. Languages like Algol 60 and some versions of LISP,
which allow the size of an array to be determined at run-time, require
a bit more care. Algol 60, for example, requires that you declare the
type (real, boolean, etc.) of the array and specify the number of
dimensions in the array, but you can postpone until run-time the
designation of the size of each dimension. To handle this complexity
a dope vector is introduced.
A %2⊗→dope vector↔←%* is typically a header or descriptor associated with
the area containing the actual array elements. The information in the dope vector
tells the functions which access the array how to treat the data.
Type and dimensionality are typical entries in dope vectors.
The compiler can determine the size of
the dope vector, but not the contents. The dope vector is filled in
when the array declaration is executed
and contains information about the actual extent of the
array bounds. Since the size of the array is not known, the
compiler cannot allocate space for the array elements. The
allocation must also be done at run-time. When the array declaration is
executed we must know all the information about the array. At that
time we allocate space on the stack and complete the dope vector. Subsequent
references to array elements use the dope vector. When we leave the
block which caused allocation of the array, the
stack space is reclaimed.
Assume that
the array elements are stored by rows and look at the calculation of
the location of element A[i;j]. For a specific execution of an array
declaration much of this information is constant: the location of
the array elements, in particular A[1;1], and the number of columns
N are fixed.
.Group;
Thus we rewrite the calculation as:
.PT18;
\###constant part\variable part
\[loc [A[1;1]]-N-1]#######+\######(i*N+j)
.pt18;
The constant part is stored in the dope vector. When we wish to
reference an element A[i;j] we need only compute the variable part
and add it to the constant part.
.APART
The dope vector for A[1:M; 1:N] might contain
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
⊂ααααααααααααααα⊃
~ 2 ~
εααααααπααααααααλ
~ 1 ~ M ~
εααααααβααααααααλ
~ 1 ~ N ~
εαααααα∀ααααααααλ
~ constant part ~
εαααααααααααααααλ
~ . . . ~
array elements
~ . . . ~
%ααααααααααααααα$
.END
.BOXB
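.FP
Given such a dope vector, a reference to A[i;j] reduces to adding the variable
part to the stored constant part. Here is a sketch of the accessing computation,
where %3constpart%1 and %3cols%1 select the constant part and the number of
columns from the dope vector, and %3fetch%1 retrieves the contents of a
location; all three names are ours:
.BEGIN CENTERIT;GROUP;
←%3access <= λ[[a;i;j] fetch[plus[constpart[a];plus[times[i;cols[a]];j]]]]%1
.END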
.FP
There is another scheme for storing arrays which is used in
some of the Burroughs machines#(⊗↑[Org#71]↑,#⊗↑[Dor#76]↑).
Each row is stored sequentially and
access to separate rows is made through a device called a
%2⊗→mother-vector↔←%*. The mother-vector is a vector of pointers to the
rows.
.GROUP;
Thus,
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "[" FOR "\";
.TURN OFF "\";
.TABS 57,61;
.KRK
⊂ααααααα⊃ ⊂αααααααααααααααααα [αααα[⊃
~ #ααβααα→~ A∞411∞* A∞412∞* ...[A∞41N∞*[~
εαααααααλ %αααααααααααααααααα [αααα[$
~ #ααβαα→ . . .
εαααααααλ
. . .
εαααααααλ ⊂αααααααααααααααααα [αααα[⊃
~ #ααβααα→~ A∞4M1∞* A∞4M2∞* ...[A∞4MN∞*[~
%ααααααα$ %αααααααααααααααααα [αααα[$
.END
.BOXB;
.APART
Notice that the accessing computation is very cheap.⊗↓However access
by array columns can be expensive. If each row is on a separate
page in the machine, the access overhead can be substantial.← Another effect
is that all rows need not be in memory at once. On a paging or
segmenting machine
this array organization can be helpful. If an access to a row not in
core is made, a "page#fault" is raised; the monitor brings the row
into memory and the computation continues. The mother-vector scheme
generalizes nicely to multidimensionality and can be used in
conjunction with a dope vector.
.END
A typical implementation of an array facility in LISP would include
a declaration:
.DEF
⊗→%3array%*↔←%3[%1<identifier>;<type>;<bounds>; ... ;<bounds>], where the
identifier names the array; the type could be numeric or S-expr; and finally
a declaration of upper and lower bounds for each dimension would be needed.
%3array%* is a special form whose effect is to make the array name a %3SUBR%*,
whose code is the calculation of the dope vector. Thus,
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
⊂ααααααπαααα⊃ ⊂ααααααααααααα⊃
~ SUBR ~ #αβ→αα→~ dope vector ~
%αααααα∀αααα$ ~ calculation ~
εαααααααααααααλ
~ array ~
~ elements ~
. . .
%ααααααααααααα$
.END
.BOXB
.FP
If we are to store S-exprs in the array, then the ⊗→garbage collector↔← must
be able to mark the entries. This is the reason for including type information.
When an array element is to be referenced, the subscripts are evaluated
(since the array name was declared as a %3SUBR%*) and the dope vector code
is executed. That computation results in a reference to the appropriate cell.
.GROUP;
We also must be able to store information in the array.
.DEF
%3store[%1<name>[<subscr>; ... ;<subscr>];<value>] : ⊗→%3store%*↔← is a special form
whose effect is to store the value of <value> in the designated array element.
.APART
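.FP
For example, under these conventions a small array of S-exprs might be declared
and used as follows; the particular type name and bounds notation are only
suggestive:
.BEGIN CENTERIT;GROUP;
←%3array[A; SEXPR; 1:2; 1:3]%1
←%3store[A[1;2];(X Y)]%1
←%3A[1;2]%1   gives   %3(X Y)%1
.END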
.CENT(Problem)
.NL
1.##Implement a stack in LISP first using lists or dotted pairs, then using
an array. Include implementations of the stack operations.
.SS(Strings and Linear LISP,string processor,P27:)
.BEGIN TURN ON "←";
.FP
On {yon(P216)} we discussed one representation for LISP print names:
a linked list of full words, each containing part of the external
name for the atom. Print names are a special instance of a data structure
named strings, and our use so far in LISP of strings has been restricted to
manipulating string constants. In this section we will discuss alternative
representations for strings, and discuss components of a string-manipulating
language.
The
organization and implementation of a general string processor
directly parallel LISP. In fact a subset of LISP,
⊗→linear LISP↔←, is a nice notation for string algorithms.
A %2string%1 is a sequence of characters. A language
for string processing should allow the creation of strings
of arbitrary length at runtime; it should allow the generation of new
strings and the decomposition of existing strings. If strings of
arbitrary length are to be created, an organization similar to FS in
LISP can be used with, perhaps, a string garbage collector. We will
assume this most general case.
The machine memory will contain at least a ⊗→string space↔←, an
evaluator, a symbol table, and a garbage collector.
String space is a linear sequence of cells, each of which can
contain exactly one character. A string will be represented as a
sequence of contiguous character cells. A string variable will be an
entry in the symbol table; the current value of the variable will be
represented as a pair: character count and a pointer to the beginning
of the character sequence.
.GROUP;
Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.KRK
⊂αααπααα⊃
~ 3 ~ # ~
%ααα∀αβα$
~
%α→ααα→⊃
↓
. . .ααααπαααπαααπαααπαααπαααπααα . . .
A B B 1 D . . . string space
. . .αααα∀ααα∀ααα∀ααα∀ααα∀ααα∀ααα . . .
.BOXB
.END
encodes the string %3ABB%*.
.APART
.BEGIN TURN OFF"←";
We must make some decisions about how we manipulate strings: when we
perform %3x ← y%*, do we copy the symbol table pair of %3y%* into that of %3x%*, or
do we make a new string isomorphic to %3y%* and point %3x%* at it? It makes
a difference. We will choose the former, copying only the
"descriptor," that is, we will share strings (and substrings)
wherever possible. This decision makes the storage requirements less
expensive, but will make our lives more difficult when we worry about
⊗→garbage collection↔←. There are three primitive functions: ⊗→%3first%*↔←,
⊗→%3rest%*↔←, ⊗→%3concat%*↔← (read: %3car%*, %3cdr%*, %3cons%*).
.END
.GROUP;
.DEF
%3first[x]%* is the first
character of the string represented by %3x%*. %3first%* is undefined for the
empty string,#%7e%1. For example,
.EQ
%3first[ABC]%1 is %3A; first[%7e%3]%1 = %9B%1
.APART
.GROUP;
.DEF
%3rest[x]%* is the string of characters which remains when the first
character of the string is deleted. %3rest%* is also undefined for the
empty string. For example,
.EQ
%3rest[ABC]%* is %3BC%*
.APART
.GROUP;
.DEF
%3concat[x;y]%* is a function of two arguments. %3x%* is a character; %3y%* is
a string. %3concat%* forms a string consisting of the concatenation of a
copy of %3x%* and a copy of the string %3y%*. For example,
.EQ
%3concat[A;BC]%* is %3ABC%*
.APART
.GROUP;
There are three string predicates:
.EQ
⊗→%3char%*↔←%3[x]%1: is %3x%* a single character?
.EQ1(24);
\⊗→%3null%*↔←%3[x]%1: is %3x%* the empty string?
\%3x ⊗→%3=%*↔← y%1: are %3x%* and %3y%* the same character?
.END
.APART
.GROUP;
For example:
.EQ1(24);
\%3char[A] %1is %et%1
\%3char[AB] %1is %ef%1
\%3AB = AB %1is %9B%1
.END
.pt18
.APART
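.GROUP;
For example, a predicate which tests whether the character %3c%1 occurs in the
string %3x%1 can be written using only these primitives; the name %3occur%1 is
ours:
.EQ
%3occur <= λ[[c;x][null[x] → %ef%3; c = first[x] → %et%3; %et%3 → occur[c;rest[x]]]]%1
.APART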
.FP
The implementation of these string primitives is somewhat less complex
than that of LISP primitives.
%3first%* generates a character count of one and a pointer to the
first character of the parent string.
%3rest%* generates a character count of one less than that of the
parent and a pointer to the second character of the parent string.
%3concat%* is a bit more problematic. We copy %3x%* and copy %3y%*
so that %3y%1 follows %3x%1 in free string space; we
generate a character count of one plus the count of %3y%*, and
generate a pointer to the copy of %3x%*. The copies are
made in the free string space pointed to by the %2⊗→string space pointer↔←%*.
The obvious question of what to do when string space is exhausted
will be postponed for a moment. The implementations of the three
predicates are easy: we will blur the distinction between characters
and strings of length one. Thus %3char%* need only check the character count.
%3null%* gives %et%1 if the count is zero. What about#=#? Obviously characters
are not stored uniquely, so we must make an actual character
comparison.
The implementation of storage management in a string processor can be more
complex than a simple LISP implementation. We use a garbage collection
scheme which, in some ways,
is simpler and, in some ways, more complicated than a collector
of list-structure. The marking phase is much simpler since we use the
descriptor in the symbol table to mark the character string. Since we
are sharing substrings, we cannot stop the marking simply because we
have encountered a previously marked character, but at least
the marking is not recursive.
However, the collection
phase needs to be more sophisticated for string processors. Since
strings are stored linearly (or sequentially), a fragmented string
space list is of little use. Thus we must compact all the
referenceable strings into one end of string space, freeing a linear
block for the new free string space. Since we are sharing substrings,
a little care needs to be exercised. When we move a string,
obviously the descriptor of any variable referencing any part of that
parent string must be changed to reflect the new location. So before
we begin the relocation of strings, we sort the string descriptors on
the basis of their pointers into string space. We then recognize
each parent string, moving it down into freed locations and updating
the address pointers of any substrings. We continue this process.
Eventually all strings will be compacted down in string space. The
string space pointer will be set and the computation can continue.
Compacting garbage collectors can be adapted for use in LISP or more
general types of data structures.
.END
.SS(A Compacting Collector for LISP,,P291:)
.P173:
.FP
We can combine the simplicity of the original mark-sweep garbage collector
with the sophistication of the collection phase of the string
garbage collector and
produce a compacting garbage collector for LISP.
The intention is to move all active structures to contiguous locations at
one end of the appropriate space.
There are several motivations for compacting storage. First, besides making the
%2active%* storage contiguous, we also make the %2free%* locations contiguous.
Thus the free lists can be handled as arrays rather than as lists. To get the next
free element, take the next element in the free array.
The second reason for considering compaction comes up in languages other
than LISP.⊗↓As we shall soon see, the rationale is
applicable in LISP as well.← If the language allocates storage in a manner similar to LISP
but the constructs allow %6different%*-sized blocks to be specified
(a string processor is a simple example), then
compaction is a strong contender. To subdue the fragmentation of memory
we could consider compacting all free locations to one end of memory.
More general schemes are also viable. We will talk about these possibilities
when we discuss dynamic allocation.
Another reason for
concern with compacting is related to hardware. If the underlying machine is
using a paging scheme, then we can try to minimize page-faults by keeping
the LISP structures localized. In the worst case, we could have every element
of a list on a separate page; this would obviously increase machine
overhead.⊗↓Very little empirical work has been done on the actual storage
requirements and running environment of LISP.
A start is made in ⊗↑[Cl#76]↑; much more should be done.←
However,
we cannot restrict the operations of the LISP programmer. The underlying hardware
%2must%* be invisible to the user. The next best thing is to try to keep
the structures as local as possible; compaction of spaces is a first attempt
at this. We will discuss other lower-level tricks later.
So granted that compaction is a worthwhile endeavor, how do we proceed?
Clearly we can't simply mark all the
active cells and then move them into unmarked cells to compact the space.
We must also maintain the original topological relationships between the elements.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
⊂αααπααα⊃ ? ⊂αααπααα⊃
204 ~204~204~∞1 is not the same as ∞b ?100 ~204~204~
%ααα∀ααα$ ? %ααα∀ααα$
.BOXB
.END
Besides moving the cells, we must also update each reference to a moved location:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
⊂αααπααα⊃ ? ⊂αααπααα⊃
204 ~204~204~∞2 is∞1 the same as ∞b ?100 ~100~100~
%ααα∀ααα$ ? %ααα∀ααα$
.BOXB
.END
.FP
To handle these problems, we expand the sweep phase into two phases:
the %2relocating%* phase and the %2updating%* phase. As with the sweep phase,
there are relocating and updating phases for %6each%* space.
The relocating phase begins after all active structure is marked.
Assume we are to compact all active structure %6down%* to the
bottom of the space. First we initialize two pointers: a %2free%* pointer
to the lowest cell in the space; and an %2active%* pointer to the top of the space.
We move the active pointer down until we come across a marked location;
we move the free pointer up until we locate an %6unmarked%* cell.
We want to move that marked cell down into the free location; but
we must also supply enough information to maintain the original relationships
in the transformed structure. The cell we move may reference other cells
which will be moved.
.BEGIN GROUP;
Here's a picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.BOXA
. . .
⊂αααπααα⊃
77 ~ 65~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
100 ~ ~ ~ ←∞1free pointer∞*
%ααα∀ααα$
. . .
⊂αααπααα⊃
155 ~ ~ ~
%ααα∀ααα$
. . .
⊂αααπααα⊃
204 ~402~ 77~
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~204~402~ ←∞1active pointer∞*
%ααα∀ααα$
. . .
.BOXB
.END
.END
.FP
Cell %b77%* was active so we left it alone; it references cell %b65%*,
which has already been visited; and also references cell %b402%* which is about
to move. We move the contents of cell %b402%* into cell %b100%*, and to
let everyone know where the contents has gone, we leave a forwarding address of
%b100%* in location %b402%*.
.BEGIN GROUP;
Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃
100 ~204~402~ ←∞1free pointer∞*
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~ 100~ ←∞1active pointer∞*
%ααα∀ααα$
.BOXB
.END
.END
.FP
The active pointer, having writ, moves on;
it skips over any unmarked cells, looking for the next marked
location. Assume the next marked
location is %b204%*. It stops there and waits for the free pointer
to discover that location %b155%* is the next free location.
In its search the free pointer will skip over any marked cells.
The same relocation operation occurs: the contents of %b204%* is moved
to location %b155%*, and the forwarding address of %b155%* is placed in location
%b204%*. The process continues until the two pointers collide.
Call that collision point %2col%*.
When they meet, all locations above %2col%* either will be free or will
contain forwarding addresses. All addresses, %2col%* and below, will contain
marked words or relocated cells.
We are now ready to enter the %2update%* phase.
.BEGIN GROUP;
Here is the picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.boxa
. . .
⊂αααπααα⊃
77 ~ 65~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
100 ~204~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
155 ~402~ 77~
%ααα∀ααα$
. . . ← ∞2col∞*
⊂αααπααα⊃
204 ~ ~155~
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~ ~100~
%ααα∀ααα$
. . .
.boxb
.END
.END
.FP
We examine the initial segment of our space from the bottom to %2col%*
looking for any references to that area %6above%* %2col%*. A reference
to that area must be changed. What is found in the referenced cell is not
the desired information, but is the forwarding address of the desired information.
What to do is obvious: tell the sender what the change of address is.
Thus the %3cdr%*-part of cell %b77%* becomes %b100%*; the %3car%*-part doesn't
change. Cell %b100%* refers to two relocated cells; we find their forwarding
addresses, and cell %b100%* becomes
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃
100 ~155~100~
%ααα∀ααα$
.BOXB
.END
.FP
Similar treatment is given to cell %b155%*, modifying the %3car%*-part.
When all cells below %2col%* are updated, the garbage collection is finished.
The cells above %2col%* are all available for the free-list.
.GROUP;
.CENT(Problems)
.NL
1.##Is %2col%* in the free space list after the update phase?
.NL
2.##Write a LISP algorithm for compacting garbage collection in LISP.
.APART;
.SS(Representations of Complex Data Structures,,P138:)
.FP
In our discussion of abstract context-free data structures in {yonss(P287)},
we isolated three kinds of structures:
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 ... %eD%41%1
e.g.,←<seq> ::= %2(%*<seq elem>%2,%* ..., <seq elem>%2)%*
.END
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 | %eD%42%1 | %eD%43%1
e.g.,←<seq elem> ::= <indiv> | <seq>
.END
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 %eD%42%1 %eD%43%1... %eD%4n%1
e.g.,←<sexpr> ::= %3(%*<sexpr> . <sexpr>%3)%*
.END
.PT18;
.FP
We have discussed the behavioral characteristics of algorithms
which operate on these structures. Now we wish to examine
the storage structure aspects of these data structures.
Corresponding to these three
data structures are three "natural" storage representations.
By "natural" we mean that even though all these structures
can be represented as LISP S-expressions, for example, there
are representations which might better suit the operations
which we expect to perform on those structures.
Since "natural" is not a well-defined term, we will
clarify its meaning using examples of context-free data structures.
The first type of data structure given above maps naturally onto
a representation which contains information that the object is of
type %eD%1 and contains space for storing an instance of this data type.
Elements of type %eD%1 are homogeneous, being all of type %eD%41%1;
however, the size of a type %eD%1 element is indefinite.
Depending on the operations which are to be performed on the representation,
either a list representation or an array representation is reasonable
for the storage structure. Unless the operations are quite
complex, a sequential allocation scheme suffices.
The second type of data structure is frequently represented as a pointer.
There really isn't any storage allocated for objects of this type.
Instances which satisfy this equation have their storage
requirements set by one of the %eD%4i%1 alternatives.
We will discuss pointer manipulation in LISP in the next section.
This section will discuss the third abstract data structure.
The essential characteristic here is that instances of this structure
have a fixed number of components, and the types of
those components need not be homogeneous.
Those components are typically referenced by name.
These characteristics form a natural distinction between
this third class and the first class, even though an appropriate encoding
would make it possible to represent either class in the other.
.GROUP;
For example, in equations like
.BEGIN CENTERIT;
.PT18;
←<sexpr> ::= %3(%1<sexpr> . <sexpr>%3)%1
.ONCE INDENT 0;
or ←<form> ::= <function>[<arg-list>]
.END
.PT18;
.FP
we reference components by selectors like %3car%1, %3cdr%1, %3func%1,
and %3arglist%1.
.APART;
LISP represents instances of the above equations
as objects of the first and second types of data structure:
variable-length lists of pointers.
As a result, we have thought of these selectors as operations
which might require some nontrivial amount of computation
to discover the desired component, but as we saw in
{yonss(P290)} what is algorithm and what is data depends on your point of
view. For example, we could think of a dotted pair as an array which has
two components, one referenced by %3car%1, one referenced by %3cdr%1.
We say "array," since the number of components is known; but the
element references are done by nonnumerical names.
The natural storage requirements for such objects imply a fixed amount
of storage. That storage can be sequentially allocated since the
size of the element will not vary. The representation must also
encode the scheme for associating external selector with internal
representation.
.GROUP
For example,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK;
.BOXA
⊂αααααπααααα⊃
~ CAR ~ ~
εαααααβαααααλ
~ CDR ~ ~
%ααααα∀ααααα$
.BOXB
.END
.APART
.FP
Notice that the array-referencing mechanisms have to solve a similar
problem. However, array representation is such that the dope vector
can perform a %2calculation%1 to locate the element.
The storage element which we have been developing is called
a %2⊗→record↔←%1 (⊗↑[Pop#68]↑), or a %2structure%1 (⊗↑[Alg#75]↑,
⊗↑[EL1#74]↑), or a %2plex%1 (⊗↑[Han#69]↑).⊗↓A similar device, called a
%2hunk%1, has been implemented
in MacLISP ⊗↑[Ste#pc]↑.←
.BEGIN TURN OFF "←";
Besides the usual constructors, selectors and recognizers, records
are typically supplied with a function
to modify components of a structure. This function is called an
%2updater%1.
Just as we can write %3A[43]#←#56%1 where %3A%1 is an array, an
updater function would implement a statement like %3car[x]#←#(A#.#B)%1.
.END
Updating of simple variables is called assignment.
A discussion of "updating" of more general
data structures requires a deeper understanding of
the implementation and storage structures. In the case of
LISP, it requires a discussion of
the modification of pointers. That is the topic of the next
section.
.SS(%3rplaca%* and %3rplacd%*,,P171:)
.BEGIN TURN ON "←";
.FP
The discussion in {yonsec(P188)} developed an implementation of the
LISP operations in terms of the manipulation of pointers.
Those manipulations allowed the
creation of new structure or allowed
sharing of an existing structure. None of these operations involved the
modification of an existing structure.
In this section we will discuss
some LISP coding
tricks which do involve modification operations.
.BEGIN GROUP;
First, consider
%3←append <= λ[[x;y][null[x] → y; %et%3 → concat[first[x];append[rest[x];y]]]]%1
.END
.FP
This function copies %3x%* onto the front of %3y%*. Note: %3y%* is not copied.
Or recall the %3subst%* function: it generates a copy with the correct
substitutions made. LISP operations must make copies in general; otherwise
some very
unsavory side effects can happen.
Consider the expression %3append[(A B C);(D E)]%*. It appears that we
could get the effect of %3append%* by %3rest%*-ing down the list
%3(A B C)%* until we found the terminator, and then we could replace
that terminator with a pointer to the list %3(D E)%*. Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ A ~ #αβα→~ B ~ #αβα→~ C ~≤'. . .→~ D ~ #αβα→~ E ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
The resulting structure does indeed look like one we would have
obtained from %3append%1. The operation we performed
%6modified%1 the existing structure, instead of %6copying%1 it as %3append%1
would have done. Such modifications can cause serious difficulties.
.GROUP;
Let us call the modifying version of %3append%1, %3nconc%1; and consider the
execution of the following sequence of statements:
.BEGIN TABIT1(30);TURN OFF"←";SELECT 3;
%1first%3\i ← (A B C)
%1then%3 \j ← (D E)
%1and finally,%3\k ← nconc[i;j]
.pt18
.END
.APART
.FP
After execution of the third statement, %3k%1 would have value %3(A#B#C#D#E)%1.
The difficulty is that %3i%1 would %6also%1 have this value since we
modified the structure assigned to %3i%1.
Also, any value
which was sharing part of the structure of %3i%* will also be changed.
Language features which surreptitiously change values are evil. Sometimes,
however, it is quite useful to be evil.
The modification procedures, such as %3nconc%1, can be defined
in terms of two obscene
functions: %3rplaca%* and %3rplacd%1.
.BEGIN GROUP;
The procedure ⊗→%3rplaca%*↔←%3[x;y]%* replaces the %3car%*-part of %3x%* with %3y%*.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα###→
εαααπαααλ ↓ εαααπαααλ
# # # ←$ ~ x ~ # βα→α⊃
~ ~ β→ - - →⊃ εαααβαααλ ~
εαααβαααλ ~ y ~ #αβ→⊃ ↓
~ ~ ~ ↓ %ααα∀ααα$ ↓ ~
# # # ⊂αααπααα⊃ ⊂ααααααα⊃ ~ ↓
εαααβαααλ ~ # ~ ? ~ ⊂ - - →~ ? ~←$ ~
~ ~ ~ ⊂→%α|α∀ααα$ | %ααααααα$ ~
%ααα∀ααα$ ↑ % - → - →$ ↓
%←ααα←ααααα←αααα←ααααα←ααααα←ααα$
.BOXB
.END
.LE(6,Algorithm for %3rplaca%1)
.END
.FP
The second function is
⊗→%3rplacd%*↔←%3[x;y]%* meaning replace the %3cdr%*-part of %3x%* with %3y%*.
The AMBIT/G description of this function was given on {yon(P246)}.
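For example, if %3x%1 is the dotted pair %3(A . B)%1, then %3rplaca[x;C]%1 changes
%3x%1 to %3(C . B)%1, and a subsequent %3rplacd[x;D]%1 changes it to %3(C . D)%1;
no new cells are created by either operation.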
.BEGIN GROUP
Now %3nconc%* can be defined as⊗↓Since we're really involved with the representation
we use %3car%* and %3cdr%* rather than %3first%* and %3rest%*.←
.BEGIN SELECT 3;TABIT2(16,19);TURN OFF"←";
.KRK
.BOXA
nconc <= λ[[x;y] prog[[z]
\\[null[x] → return[y]];
\\z ← x;
\a\[null[cdr[z]] → rplacd[z;y]; return [x]];
\\z ←cdr [z];
\\go[a] ]]
.BOXB
.END
.END
.BEGIN SELECT 3;TABIT2(23,30);TURN OFF"←";GROUP;
.KRK
.BOXA
%1Consider:%*\prog[[x]\x ← (NOTHING CAN GO WRONG);
\\rplacd[cdddr[x];cddr[x]];
\\print[x]]
.BOXB
.END
.FP
We can use %3rplacd%* to generate circular lists.
Circular lists cannot be generated in LISP without functions like
%3rplaca%* and %3rplacd%*.
See the problem on {yon(P55)}.
In general, to circularize a nonempty
list, %3x%*, %3rplacd[last[x];x]%* suffices where
←%3last <=λ[[x][null[cdr[x]] → x; %et%3 → last[cdr[x]]]]%1
Functions which modify list structure must be used with extreme caution.
They are not
recommended for amateur hacking. They are introduced here simply to
show that it is possible to improve on the efficiency of pure
algorithms by resorting to these coding tricks.
.BEGIN GROUP
←%2Problems%*
.NL
1.##What is the effect of evaluating %3rplacd[x;cdr[x]]%*?
.NL
2.##Recall the problem on ⊗→hash consing↔← on {yon(P154)}.
There we were contemplating unique storage for all S-exprs.
Can such a scheme be reconciled (efficiently) with functions like
%3rplaca%* and %3rplacd%*?
.NL
3.##It has been pointed out that %3rplaca%1 and %3rplacd%1 are
closely related to assignment statements ⊗↑[And#76]↑.
Extend one of our evaluators to recognize expressions like
.BEGIN CENTER;TURN OFF "←";
%3car[%1<form>%3] ← %1<form>
.END
.ONCE INDENT 4;
as abbreviations for:
.BEGIN CENTER;SELECT 3;
rplaca[%1<form>; <form>%3]%1
.END
.ONCE INDENT 4,4;
This extension of assignment is obviously not restricted to
%3rplaca%1 but could allow arbitrary forms on the left-hand
side of an assignment.
.END
.END
.SS(Applications of %3rplaca%* and %3rplacd%*,,P155:)
.BEGIN TURN ON "←";
We begin with rather simple examples. Consider the problem of inserting
an element into the middle of a list. For example let %3x%* be the list
%3(A B C)%*. If we wished to insert an atom, say %3D%*, between %3B%*
and %3C%*, we could perform
.pt18
.BEGIN CENTER; SELECT 3;turn off"←";
x ← cons[car[x];cons[cadr[x];cons[D;cddr[x]]]].
.END
.PT18
.FP
In general, we have little choice but to recopy the initial segment
of %3x%*, adding %3D%* into the appropriate place. A similar technique is
obviously available to delete a specified element from the interior of a list.
Careful use of %3rplacd%* can, in some instances, replace the heavy use of %3cons%*.
%3cons%* always carries with it the threat of a call on the garbage collector.
For example, given the list %3(A B C)%* with pointers %3x%* and %3y%* into it
as follows:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12
.BOXA
x y
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
we could insert the element %3D%* %6after%* the first element in %3y%* by
←%3rplacd[y;cons[D;cdr[y]]]%*, giving⊗↓Notice
that %6one%* application of %3cons%* is unavoidable.←:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA
x y
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβα⊃ ⊂→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ ↓ ↑ %ααα∀αα$
↓ ~
⊂αααπααα⊃ ↑
~ D ~ #αβα$
%ααα∀ααα$
.BOXB
.END
.FP
But as always, be warned that the value of %3x%* has also been changed; and
any S-expr sharing the list %3x%* or %3y%* as a sublist has also been affected.
We can also use %3rplacd%* to delete not the %2first%*, but the next element
in %3y%* by
.EQ
%3rplacd[y;cddr[y]]%*
.PT18;
.FP
Similarly, we can use %3rplaca%* to modify an element in a list (or S-expr).
To change the first element in the list, %3y%*, to the S-expr %3z%* use
.EQ
%3rplaca[y;z]%*
.PT18;
.FP
Notice that the uses of %3rplacd%* for insertion and deletion are couched in terms
of insert %6after%* and delete %6after%*, rather than insert %6at%* or delete %6at%*.
If you look at a diagram you will see why.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12;
.BOXA
x
~
↓ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
To delete the element %3B%* requires modifying the %3cdr%*-part of the predecessor
cell; a similar remark applies to insertion at a specified cell.
A simple, perhaps inefficient, scheme to support such modification
would be to start a second pointer
from the beginning of the list, looking for the cell whose %3cdr%* pointed to the
desired spot; then make the modification.
If these "modification-%6at%*" functions were to be performed very frequently,
then it might be worth starting %6two%* pointers down the list, one at %3x%*,
one, say %3y%*, at %3cdr[x]%*, as above. Then testing could be done using %3y%*
and the modification could be done using %3x%*. When we move %3y%* to
%3cdr[y]%*, we move %3x%* to %3cdr[x]%*. If we wanted to modify
%6before%* rather than %6at%*, we could proliferate the "back pointers," but
if this kind of generality is required a change of representation
is called for. We might resort to the double-linking scheme introduced on
{yon(P164)}; more complex representations are also discussed
in detail in ⊗↑[Knu#68]↑ Chapter 2.
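.FP
For example, a function which removes the first top-level occurrence of the atom
%3u%1 from the list %3l%1 can be written with such a trailing pointer. This is
only a sketch, and the name %3delete%1 is ours; note that when the %2first%*
element is the one removed, no modification occurs and the caller must use the
returned value:
.BEGIN SELECT 3;TABIT2(15,18);GROUP;TURN OFF"←";
.KRK
.BOXA
delete <= λ[[u;l] prog[[z]
\\[eq[car[l];u] → return[cdr[l]]];
\\z ← l;
\a\[null[cdr[z]] → return[l];
\\ eq[cadr[z];u] → rplacd[z;cddr[z]]; return[l]];
\\z ← cdr[z];
\\go[a]]]
.BOXB
.END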
A LISP
implementation which stores p-lists as list structure would
use %3rplaca%* and %3rplacd%* heavily; for example, functions
which modify properties on the p-lists would use these functions. Here are the
two p-list manipulating functions, ⊗→%3putprop%*↔← and ⊗→%3remprop%*↔←.
.GROUP;
.DEF
%3putprop%* was introduced on {yon(P52)}.
Recall that the effect of %3putprop%* is to attach an indicator-value pair
to an atom. If the indicator is already present, then we will simply change its
value;
if the indicator is not present, then we will add the indicator-value
pair to the front of the p-list.
In the definition %3n%* is an atom, %3i%* is an indicator, and %3v%* is
the value to be stored.
.BEGIN SELECT 3;TABIT2(17,20);GROUP; TURN OFF"←";
.KRK
.BOXA
putprop <= λ[[n;v;i] prog[[m]
\\m ← cdr[n];
\a\[eq[car[m];i] → rplaca[cdr[m];v];return[v]];
\\m ← cddr[m];
\\[null[m] → rplacd[n;cons[i;cons[v;cdr[n]]]];return[v]];
\\go[a] ]]
.BOXB
.END
.FP
Note that extended conditional expressions are used in the definition.
.APART
.GROUP;
.DEF
%3remprop%* was also introduced on {yon(P52)}.
%3remprop%* is a predicate used to remove attribute-value pairs from
the property list of an atom.
We will capitalize on the LISP "%3NIL%1-non#%3NIL%1" trick
for predicates and return the removed property value if one is found.
The following implementation of %3remprop%1 does that.
.BEGIN SELECT 3;TABIT2(15,18);GROUP;TURN OFF"←";
.KRK
.BOXA
remprop <= λ[[n;i] prog[[m;v]
\\m ← n;
\a\[eq[cadr[m];i] → v ← caddr[m]; rplacd[m;cdddr[m]]; return[v]];
\\m ← cddr[m];
\\[null[cdr[m]] → return[%ef%3]];
\\go[a]]]
.BOXB
.END
.APART
.FP
Applications of %3rplacd%* will occur inside ⊗→%3ratom%*↔← and
on {yon(P160)}
we will develop a version of LISP's parser which uses pointer modification to
gain efficiency.
Pointer modification is also used
inside the ⊗→garbage collector↔←. Recall the %3sweep%1 phase of the collector
on {yon(P257)}.
.P161:
Finally, one of LISP's less illustrious uses of pointer modification
is to "allow" the construction of self-modifying programs.
This technique is similar to the machine language tricks of self-modifying code
and should be used with similar frequency.
The freedom to hang yourself should not be construed as an invitation
to do so, but
it again points out the similarities of LISP to machine language and
highlights the differences between LISP and its contemporary high-level
languages.
LISP's central processor %3eval%* operates by traversing and interpreting
the data structure representation of the program; that data structure is also
open for inspection by LISP's data structure manipulating functions.
Since we now have list-modifying functions, we could conceivably modify a program
by changing its internal structure. Indeed we can write a program which
modifies its %6own%* structure.
.BEGIN TABIT2(14,16);GROUP;TURN OFF "←";
Here's one:
.SELECT 3;
.KRK;BOXA
foo <= λ[[x] prog[[y;z]
\\z←1;
\\y←sixth[body[foo]];
\a\print[x];
\\rplaca[rest[y];z←add1[z]];
\\go[a] ]]
.BOXB
.END
.FP
The mystery is created by %3y%*, which is a pointer into the representation of the
statement %3print[x]%*; that representation is %3(PRINT#X)%*. Therefore
the effect of the first %3rplaca%* is to change %3(PRINT#X)%* to %3(PRINT#2)%*.
Subsequent passes through the loop will change the statement to print
%33, 4,%* and so on.
There really isn't much that can be said about such a program.
.CENT(Problems)
.NL
1.##More on %3ratom%*.
Recall the discussion of %3ratom%* in {yonss(P13)} and {yonss(P14)}.
Now that you know about %3rplaca%* and %3rplacd%* write a more detailed
version of %3ratom%*.
.NL
2.##Recall the function which reverses the top-level elements of a list.
On {yon(P16)} and {yon(P23)}
we wrote it in various styles and with varying degrees of efficiency.
All these functions used %3cons%*; however, it is clear that we should be
able to reverse a list without using %2any%* new cells.
Express this
algorithm as a LISP function. If you use %3prog%*, don't use any %3prog%*-variables.
.END
.GROUP;
.SS(Numbers,numbers)
.FP
In most implementations of LISP, numbers are stored as very simple kinds of
atoms: they do not need print names, but since we should probably
allow fixed- and floating-point representation, we do need indicators
for these properties. Thus
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 8,8;
fixed-point 1
~
↓
~ ⊂ααααααααπααα⊃ ⊂ααααα⊃
%→ ~ FIXNUM ~ #αβαα→~ 1 ~
%αααααααα∀ααα$ %ααααα$
floating-point 1
~
↓
~ ⊂ααααααααπααα⊃ ⊂ααααααααααααααααααααααα⊃
%→ ~ FLONUM ~ #αβαα→~ <machine rep of 1.0> ~
%αααααααα∀ααα$ %ααααααααααααααααααααααα$
.BOXB
.END
.APART;
.FP
In this representation, the number is stored in FWS and the
type is indicated by a minimal property list.
This representation
is a bit expensive in space and can be expensive to manipulate with arithmetic
operators. Several tricks have been used to improve arithmetic in LISP.
Assume that the addressing
space of the machine is 2%818%* and that the usual size of a LISP
core image is significantly smaller, say, N. Then all memory address
references greater than N are illegal (and trapped by the monitor).
What we will do is use these illegal addresses to encode some of the
smaller positive and negative integers, mapping zero onto the middle
address, the positive numbers onto lower addresses, and the negatives
onto higher addresses. Thus these smaller integers, called
%2INUMS%*, are
represented by pointers outside of the normal LISP addressing space.
This trick can considerably decrease the storage requirements for
jobs heavily using small numbers. (Numbers are not usually stored
uniquely.)
.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 28;
.BOXA
⊂αααααααααα⊃ 2∞818∞*
~?~
~?~
~?~ m ∞1<∞* 0
~?~
~?~ m = 0
~?~
~?~ m ∞1>∞* 0
~?~
~?~
εααααααααααλ N
~?~
~?~
~?~
~?~
~?~
~?~
~?~
~?~
%αααααααααα$ 0
.BOXB
.END
.LE(6,Picture of INUM Space)
.APART
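.FP
Under this encoding, conversion between an INUM pointer and the integer it
denotes is a simple address calculation. Here is a minimal sketch, assuming
%3mid%1 names the address onto which zero is mapped and %3n%1 the top of the
legal LISP addressing space; the names are ours:
.BEGIN CENTERIT;GROUP;
←%3is_inum <= λ[[p] greaterp[p;n]]%1
←%3inum_val <= λ[[p] difference[mid;p]]%1
←%3inum_rep <= λ[[m] difference[mid;m]]%1
.END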
.FP
The INUM representation takes less space but adds complexity:
now the arithmetic operators have to deal with three different kinds of numbers.
The MacLISP (⊗↑[Moo#74]↑) implementation uses a different representation for
numbers.⊗↓Much care went into the MacLISP number
handling since it is used as the implementation language for MACSYMA
(⊗↑[MAC#74]↑,#⊗↑[Wan#75]↑,#⊗↑[Mos#74]↑),
a large algebraic and symbolic mathematics system. MacLISP's efficient
number facilities, coupled with its optimizing compiler,
have resulted in the production of compiled code which is more efficient
than that produced by DEC's FORTRAN compiler (⊗↑[Fat#73]↑).←
In that implementation, two spaces are allocated for number storage:
FIXNUM space and FLONUM space. This makes a more compact representation since
the type information is implied in the address of the object rather than
being explicitly
stored. To those basic spaces we add two temporary stack areas: FIXPDL and
FLOPDL. These areas are used for temporary arithmetic computation.
The temporary areas work in conjunction with a type declaration option
available in MacLISP.
If we know that certain variables are %6always%1
going to be used as numbers in a particular function, then we can compile
better code. Assume %3x%1 and %3y%1 are to be used %6only%1 as FIXNUMs
within a function %3f%1; we would make such declarations for the LISP compiler
just as we can declare some variables as "special" to LISP compilers.
When we allocate space for %3x%1 and %3y%1, we allocate space on the
top of FIXPDL. Within %3f%1 the arithmetic operations use the hardware
arithmetic and reference the stack elements. The stack elements can be
passed to other arithmetic functions called within %3f%1, and no permanent
storage need be allocated in FIXNUM space until later.
The efficiency of arithmetic operations is dependent on the existence
of special hardware instructions for such arithmetic.
However, special hardware also places limits on the arithmetic
capabilities of most languages: arithmetic
is limited by the word size. LISP is able to overcome such limitations.
Most numerically oriented programs are faced at some time with
overflow. That is, they attempt to construct a number which is too
large to be represented in one machine location. There are now
several versions of LISP which will automatically change
representation when faced with overflow.
This scheme is called %2arbitrary precision arithmetic%1 and
has been implemented for both fixed-point and floating-point
numbers. We will describe a representation for fixed-point numbers
called %2BIGNUMS%1; they could have the following structure:
.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 10,10;
positive big number
~
↓
~ ⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→ ~ POSNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↓ ↓
N∞40∞* N∞4n∞*
negative big number
~
↓
~ ⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→ ~ NEGNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↓ ↓
N∞40∞* N∞4n∞*
.BOXB
.END
.LE(8,Structure of a BIGNUM);
.APART
.BEGIN GROUP;
.FP
The value of a BIGNUM is given by:
.EQ
%7b%1(N%40%1 + %7a%1N%41%*+ ... + %7a%8n%1N%4n%1)
.PT18;
.FP
where %7b%1 is either + or - and
%7a%1-1 is the largest number representable in one machine word.
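For example, on a machine whose words hold magnitudes up to 2%835%*#-#1 we would
have %7a%1#=#2%835%*, and a positive BIGNUM whose two cells contain 3 and 1 would
denote 3#+#2%835%*, a number too large to represent in a single word.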
The translations between BIGNUMS and the other numeric representations
are done automatically within the evaluator.
On most implementations of LISP, no attempt is made to store numbers uniquely.
Thus %3eq%1 will not work on numbers other than INUMs; either
%3equal%1 is extended for numbers
or a special equality predicate for numbers is provided.
.END
.SS(Stacks and Threading)
.FP
Though recursive descriptions of algorithms are usually more illuminating than
their typical machine-oriented counterparts, it is frequently more efficient
to encode those algorithms in ways which can take advantage of contemporary
hardware. This section will discuss two techniques which "unwind" the recursion
and typically lead to faster execution.
Recall the marking phase of a garbage collector in {yonss(P146)}. There
we wrote %3mark%* as a recursive algorithm. We could equally well write %3mark%*
using an explicit stack:
.BEGIN TABIT3(6,12,27);SELECT 3;GROUP;TURN OFF "←";
.boxa
.krk
mark <= λ[[tr]prog[[st]
\loop\[is_marked[tr] → go[chk_st];
\\ is_full_wd[tr]→ markA[tr];go[chk_st];
\\ is_free_wd[tr]→\st←push[cdr[tr];st];
\\\markA[tr];
\\\tr←car[tr];go[loop];
\\ %et%* → go[chk_st]]; ⊗↓%1This branch of the conditional could be omitted
and the effect would be the same.←%3
\chk_st\[null[st] → return[%et%*]];
\\tr←top[st];
\\st←pop[st];
\\go[loop] ]]
.boxb
.APART;
.group;
push <= λ[[i;st] concat[i;st]]
top <= λ[[st] first[st]]
pop <= λ[[st] rest[st]]
.boxb
.END
.FP
Notice that we save only the %3cdr%*-node in the stack; even at that the
stack grows proportionally to the depth of the tree being traversed.
See the problem on {yon(P162)}.
The technique of using an explicit stack sometimes is more intuitive and
sometimes will lead to faster execution.
The second technique is more tricky but will lead to significant pay-offs
in execution time.⊗↓But there will be a proportional loss in clarity
in the code.←
The technique is called %2⊗→threading↔←%*.
The basis for threading is a desire to traverse tree structures in a more
efficient fashion than that typically available in recursion or via
stacks. Recall that on {yon(P164)} we surmised
that %2⊗→double-linking↔←%* might be
advantageous in moving up and down the "spine" of a tree structure.
Double links would allow us to find the successors and predecessors of nodes
easily. However the extra link gives us no help if we wish
to descend into the substructure. It is this area to which threading addresses
itself: descent into tree structure, and in fact, nonrecursive tree traversal.
Threading comes in various flavors; we will now discuss a few.
Look at the new %3mark%*-er above; notice that for a
%6fixed%* tree and a %6fixed%* order of traversal,
any two applications of marking will
have the same pattern of behavior. The order of visitation to each node
will obviously be the same, but the dynamic changes in the state of the stack
will %6also%* be the same. Instead of recreating that stack behavior on each application,
it might be possible to store the stack information in the structure itself.
This technique of hiding the control structure in the data structure is called
threading.
The general application of threading is of questionable utility. It typically
requires a more complex data structure: we must be able to store both
threads and links. The traversing algorithms also become more complex:
we obviously must be able to recognize the difference between
control threads and data links. Care must also be taken if we wish to share
threaded list structure. See ⊗↑[Knu#68]↑ for a complete discussion
of the techniques and tricks.
We are not about to complicate the LISP structures, but dispensing with a
stack, be it implicit or explicit, does have some merits. What we can
do in LISP is strike a compromise. Instead of storing the threads permanently
in the structure, there are significant applications of threading
where we can %6temporarily%* store threads as we traverse trees.
The first application is in the design of a nonrecursive %3read%* program.
The second application we will describe is in the mark phase of a garbage
collector.
.CENT(Problem)
.P162:
.NL
1.##With a little
more testing before stacking we can significantly cut down the
number of %3push%*es we have to do. Namely, if some of the branches point
immediately to atoms we might as well mark them at that time and
proceed without doing a stack operation. Only when both branches are "nonatomic"
do we need stack the %3cdr%*. Write such an algorithm.
.SS(A Non-recursive %3read%*)
.P160:
.FP
The original %3read%* algorithm of {yonss(P13)} is a good example of
a clear recursive algorithm;
it is reasonably straightforward to follow the flow of the algorithm.
However, now that we understand what the run-time behavior of such a recursive
program is, we can see that %3read%* and friends are a drain on %6two%*
resources: they
use free-space to %3cons%*-up the S-expr representation of the input;
they use the run-time stack for handling the implementation of the
recursion and for saving parameters and intermediate computations. A deeply
nested expression will use a lot of the run-time stack.
Clearly, there is nothing we can do about the drain on the free lists,⊗↓We
probably will be drawing on the full word area for print name
storage as well as on the free space area for
list structure storage.← but we %6can%* dispense with the run-time
stack by threading.
We can in fact do so without a proportional increase in the use of cells in
free space; indeed we need only %6one%* additional free cell, regardless
of the complexity of the input! This is truly a win for efficiency; it will be
a big loss for clarity. But again that's why this section on storage and efficiency
is where it is; we have an understanding of the purpose and intent of %3read%*;
only after the basic algorithm is well understood should we attempt to be
clever and efficient.
First we describe the basic ideas of the algorithm, then we give the algorithm.
The main idea in the algorithm is the realization that we really can determine
the storage requirements for a complex S-expr or list structure as we read it in.
For example consider
the input string "%3(A#(B#C)#D)%*". As we start our left-to-right scan of the
input,
we see "%3(%*". This immediately tells us that we need at least one %3cons%*.
We read "%3A%*"; that tells us what the %3car%* of the expression is. Notice that
we don't yet know whether the expression is "dotted" or "listed," but the
storage requirements will be the same. On reading the next open parenthesis we
know we need to add a new level in the developing representation. The
"%3B%*" and "%3C%*" add elements to that level, and the closing parenthesis finishes
it off. The closing parenthesis also should signal our parser to return to the
prior level and continue scanning the input. The "%3D%*" goes on that
level and the final closing parenthesis completes the input.
All this is old but the difference now is that we don't use recursion or an explicit
stack to keep track of where we are. We keep a thread in the %3cdr%*-part of
the last cell on every level. When we go down a level we manufacture a new
cell with the %3cdr%* pointing to the cell we just came from in the previous
level; this happens when we see a left parenthesis. We go up a level when we see
a right parenthesis; that is done by following up the thread in the current level,
after doing appropriate cleanup.
There are three basic states in the reader:
.group;
.NL
%21.%*##The next input should go into the %3car%*-part of the current cell.
This state is entered when we go down a level. It is labeled %3head%* in the
following program.
.NL
%22.%*##The next input should go on the current level. This is the typical state
in the building of a list-input. Here we add a new cell in the current level
and put the input in the %3car%*-part of that cell; then stay in this state.
This state corresponds to label %3tail%*.
.NL
%23.%*##The other main state occurs on reading a dot when in %3tail%*#state.⊗↓Dots
seen in any other context are errors.← In dot#state we check the next input;
if it's an atom we stuff it on the thread and go up. If it's a left parenthesis
we add a new cell and go down.
.PT24
.APART
There are some anomalies in the algorithm due to the desire
to recognize both S-expr notation
and list notation. The main malefactor is a count of parentheses; it increments for
left parentheses and decrements for right parentheses. A legal input has been
recognized when we are back at the top level and the count is zero.
The final difference between the old parser and the new one involves
the scanner %3ratom%*. We assume a new %3ratom%* which reads %3()%* and
returns %3NIL%*. This is not too strenuous an assumption. If the scanner
sees an open parenthesis, it looks ahead to the next meaningful
character.⊗↓It ignores
spaces and the like.← If that character is a closing parenthesis,
the scanner takes it and returns %3NIL%*;
if it is not, the character is left for the next call on %3ratom%*, and
%3ratom%* returns an indication that it has seen a left parenthesis.
With this introduction, here is the new %3read%*:
.BEGIN SELECT 3; TABIT3(6,16,27);TURN OFF "←";GROUP;
.KRK
.BOXA
read <= λ[[]prog[[j;cp;count;top;temp]
\count←init[]; cp←count; top←cp;
head\j←ratom[];
\[or[is_dot[j];is_rpar[j]] → err[];
\ is_lpar[j] →\incr[count];
\\cp←down[cp];
\\go[head];
\ atom[j] → stuff[cp;j]; go[ckend]];
tail\j←ratom[];
\[atom[j] → cp←insert_move[cp;j]; go[ckend];
\ is_rpar[j] →\decr[count];
\\[eq[top;cp] → go[ck1];
\\ %et%* → cp←stuff_up[cp;NIL]; go[ckend]];
\is_lpar[j] →\incr[count];
\\cp←down[insert_move[cp;NIL]];
\\go[head];
\is_dot[j] →\j←ratom[];
\\[or[is_dot[j];is_rpar[j]] → err[];
\\ is_lpar[j] →\incr[count];
\\\cp←insert_move[cp;NIL];
\\\go[head];
\\ atom[j] →\cp←stuff_up[cp;j];
\\\go[ckend]]]; ⊗↓%1This %3go%1 is superfluous, but makes the flow more apparent.←%3
ckend\[eq[cp;top] → go[ck1];
\ %et%* → go[tail]];
ck1\temp← cnt[top];
end2\[zerop[temp] → return[exp[top]];
\j←ratom[];
\[is_rpar[j] → temp←sub1[temp]; go[end2];
\ %et%* → err[] ]]]
.BOXB
.END
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(22);
.KRK
.BOXA
init <= λ[[] cons[NIL;0]]
stuff <= λ[[x;y] rplaca[x;y]]
incr <= λ[[z] rplacd[z;add1[cdr[z]]]]
decr <= λ[[z] rplacd[z;sub1[cdr[z]]]]
insert_move <= λ[[cp;val] rplacd[cp;cons[val;cdr[cp]]]; cdr[cp]]
down <= λ[[cp] rplaca[cp;cons[NIL;cp]];car[cp]]
stuff_up <= λ[[cp;j] prog[[temp]
\temp ← cdr[cp];
\rplacd[cp;j];
\return[temp]]]
cnt <= λ[[x] cdr[x]]
exp <= λ[[x] car[x]]
.BOXB
.END
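.FP
For concreteness, these helper functions translate almost directly into a contemporary
Lisp dialect. The following Common Lisp sketch is offered for illustration only; it
replaces the underlines in the names by hyphens and renames %3exp%*, since that name is
already taken in Common Lisp.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; a sketch of the reader's helpers in Common Lisp
(defun init ()      (cons nil 0))            ; top cell: car holds the expression,
                                             ; cdr holds the parenthesis count
(defun stuff (x y)  (rplaca x y))            ; plant y in the car-part of cell x
(defun incr (z)     (rplacd z (1+ (cdr z))))
(defun decr (z)     (rplacd z (1- (cdr z))))
(defun insert-move (cp val)                  ; splice a new cell in after cp and move to it
  (rplacd cp (cons val (cdr cp)))
  (cdr cp))
(defun down (cp)                             ; open a new level whose cdr threads back to cp
  (rplaca cp (cons nil cp))
  (car cp))
(defun stuff-up (cp j)                       ; close a level: plant j as the cdr, return the thread
  (let ((temp (cdr cp)))
    (rplacd cp j)
    temp))
(defun cnt (x) (cdr x))
(defun exp-part (x) (car x))                 ; EXP itself names a Common Lisp function
.BOXB
.END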
.FP
The development and understanding of this algorithm required most of what we
have covered in the course.
We used our knowledge of the parser, %3read%*; we used our familiarity with
S-exprs stored as linked lists; we had to understand the run-time control
of recursive calling sequences; we had to understand pointer manipulation;
we had to understand pointer modification; and finally we had to be wickedly
clever.
With that understanding we
were able to apply threading at a level higher than a "once#only" trick.
.CENT(Problem)
.NL
%21.%*##Write a version of %3read%* which uses an explicit stack to remember where it
is in the parse.
.SS(More Applications of Threading)
.Cent(A Link-Bending Garbage Collector for LISP)
.P144:
.FP
The use of a stack is one of the difficulties associated with garbage
collection. Garbage collection is invoked when available space has become
exhausted, but here we are asking for %6more%* space to use for stacking.
The usual solution
to such a problem is to allocate a separate area for stack storage. This
has its drawbacks: if we don't allocate enough stack space - that is, if the
depth of some piece of structure becomes too great - then the marker will fail.
And the amount of stack space required can be large, proportional to the
depth of the structure being traced.
We can apply threading here,
modifying the structure as we traverse it; as usual the threads will be used as
control information. As we finish marking a branch we
restore the structure to its original topology.
Several versions of such threaded collectors are available; see
⊗↑[Chr#68]↑ for a version
written in AMBIT/G; a more traditional description is found in
⊗↑[Sch#67]↑;⊗↓The correctness of [Sch#67] has been proved by de#Roever.←
and see ⊗↑[Knu#68]↑ for several alternatives.
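.FP
To suggest the flavor of such a collector, here is a small sketch of link-reversal
marking in Common Lisp. It operates on an explicit %3node%* structure with
%3head%*, %3tail%*, %3mark%*, and %3flag%* fields rather than on real LISP cells;
the names and the representation are inventions for this example only.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; a sketch only: marking without a stack by temporarily reversing pointers
(defstruct node head tail (mark nil) (flag nil))

(defun mark-via-link-bending (root)
  "Mark every node reachable from ROOT; the path back up is kept in
temporarily reversed head/tail pointers, and the structure is restored."
  (let ((current root) (previous nil))
    (loop
      (cond
        ;; descend: an unmarked node lies below us
        ((and (node-p current) (not (node-mark current)))
         (setf (node-mark current) t)
         (let ((next (node-head current)))
           (setf (node-head current) previous)      ; bend the link to point back up
           (setf previous current
                 current next)))
        ;; nothing above us: the traversal is complete
        ((null previous) (return root))
        ;; finished the head side of PREVIOUS: swing to its tail side
        ((not (node-flag previous))
         (setf (node-flag previous) t)
         (let ((next (node-tail previous)))
           (setf (node-tail previous) (node-head previous)   ; move the up-link into tail
                 (node-head previous) current                ; restore the head link
                 current next)))
        ;; finished both sides of PREVIOUS: restore its tail and retreat
        (t
         (let ((up (node-tail previous)))
           (setf (node-tail previous) current
                 current previous
                 previous up)))))))

;; (let* ((leaf (make-node :head 'a :tail 'b))
;;        (top  (make-node :head leaf :tail leaf)))
;;   (mark-via-link-bending top)
;;   (list (node-mark top) (node-mark leaf)))     =>  (T T)
.BOXB
.END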
.Cent(Binding Implementations)
.FP
Threading can be used in the shallow binder described in {yonss(P228)} to remember
the path through the environment tree#(⊗↑[Urm#76]↑).
We thread from E%4bind%1 to E%4inter%1
when we are looking for E%4inter%1.
This consists of reversing the access links as we proceed toward E%4inter%1.
Then, as we swap back the value cells, we will unthread from
E%4inter%1 to E%4bind%1.
.SS(Storage Management and LISP,,P224:)
.FP
There are two basic areas of LISP which require attention:
the implementation of data structures, and the implementation of the
LISP machine. We will discuss applications in that order.
LISP's most general data object is a dotted pair; however, we frequently
find ourselves using dotted pairs to represent more structured objects.
For example, many common LISP programs involve list#operations on
list representations. But lists, we know, are representations of sequences.
From {yonss(P137)}
we now also know that arrays are efficient representations of sequences.
Indeed array representations are typically more efficient than the
general LISP linked-list. We would like to capitalize on this more
efficient representation without jeopardizing the LISP operations.
An analysis of the LISP operations shows that we have to be able to %6share%*
substructures, and if using %3rplaca%* or %3rplacd%*, we have to be able to
%6modify%* existing structures. Any proposed economies in storage layout
must take cognizance of these facts. Fortunately these requirements are compatible.
Consider the typical representation of the sequence:
.BEGIN CENTER;SELECT 3;
(LAMBDA (X) (F X Y))
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ LAMBDA ~ #αβαα→~ # ~ #αβα→~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
~ ↓
⊂αααααααααα$ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
↓ ~ F ~ #αβ→~ X ~ #αβ→~ Y ~≤'~
⊂αααπαα⊃ %ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
~ X ~≤'~
%ααα∀αα$
.BOXB
.END
.FP
This takes seven words. The %3car%*-part of each cell contains the information;
the %3cdr%*-part tells where the rest of the expression is to be found.
That is, we have dedicated 14 half-words to representing the structure, only seven
of which contain the actual information we wish to store.
Using some extra encoding we can carry the same information in
seven cells. ⊗↓These cells would need to be slightly larger.←
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.boxa
.indent 15,15
⊂ααααααααα⊃
~↓ LAMBDA ~
εαααααααααλ ⊂αααααααα⊃
~↓ #αβαααα→~/ X ~
εαααααααααλ %αααααααα$
~/ #αβα⊃
%ααααααααα$ ↓ ⊂αααααααα⊃
%αα→~↓ F ~
εααααααααλ
~↓ X ~
εααααααααλ
~/ Y ~
%αααααααα$
.boxb
.END
.FP
The intent of the special characters is to encode information about
the %2next%* cell in the representation; the scheme is therefore called %2⊗→%3cdr%*-coding↔←%1.
The code %b↓%* means the next cell %2is%* the %3cdr%*; %b/%* means the %3cdr%* is
%3NIL%*.
The typical full LISP cell corresponds to a third variety of %3cdr%*-code: %b→%* says
the next cell holds a pointer to the %3cdr%*. With that, we introduce the final code:
%b*%* means this cell is the %3cdr%*-half of a LISP word.
Thus %3(A#B)%* could be expressed in any of the following forms:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααα⊃ ⊂ααααα⊃
~↓ A ~ ~→ A ~
εαααααλ εαααααλ ⊂ααααα⊃
~/ B ~ ~* #αβααα→~/ B ~
%ααααα$ %ααααα$ %ααααα$
⊂ααααα⊃ ⊂αααααπααααα⊃ ⊂αααααπααααα⊃
~→ A ~ ~ A ~ #αβααα→~ B ~ ≤' ~
εαααααλ ⊂ααααα⊃ %ααααα∀ααααα$ %ααααα∀ααααα$
~* #αβαα→~→ B ~
%ααααα$ εαααααλ
~* NIL~
%ααααα$
.BOXB
.END
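.FP
To make the codes concrete, the following Common Lisp sketch models a block of
%3cdr%*-coded cells as a vector of (code#.#item) pairs and decodes it back into ordinary
list structure. The tags, the vector, and the use of indices as pointers are inventions
for this illustration; a real implementation keeps the codes in a few spare bits of each cell.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; tags model the codes described above:
;;   :next - the next cell IS the cdr
;;   :nil  - the cdr is NIL
;;   :ptr  - the following cell is the cdr-half; it holds either the index
;;           of another coded cell or an atomic cdr
(defun decode (cells index)
  "Rebuild ordinary list structure beginning at INDEX."
  (destructuring-bind (code . item) (aref cells index)
    (cons item
          (ecase code
            (:nil  nil)
            (:next (decode cells (1+ index)))
            (:ptr  (let ((d (cdr (aref cells (1+ index)))))
                     (if (integerp d) (decode cells d) d)))))))

;; the most compact encoding of (A B):
;;   (decode (vector '(:next . a) '(:nil . b)) 0)                           =>  (A B)
;; the fully expanded form, with explicit cdr-half cells:
;;   (decode (vector '(:ptr . a) '(:cdr . 2) '(:ptr . b) '(:cdr . nil)) 0)  =>  (A B)
.BOXB
.END
.FP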
However, this encoding scheme is not sufficient as it stands. Consider the
following example: given internal pointers %3x%* and %3z%* into
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 13,13;
.BOXA
x z
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ F ~ #αβαα∀αα→~ X ~ #αβαα→~ Y ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
and assuming we wish to perform %3rplacd[x;(A#B#C)]%*,
then in our standard implementation we would arrive at
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.indent 8,8
.BOXA
x z
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ F ~ # ~ %αα→~ X ~ #αβαα→~ Y ~≤'~
%ααα∀αβα$ %ααα∀ααα$ %ααα∀αα$
↓
~ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
But if we assume %3(F X Y)%* is represented in its compact form,
we have some troubles.
We can't replace the cell
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA
⊂ααααα⊃ ⊂αααααα⊃
~↓ X ~ by ~* #αβ→ to ∞3(A B C)∞*
%ααααα$ %αααααα$
.BOXB
.END
.FP
since %3z%* would get justifiably upset. What we will do is use the
forwarding address scheme we introduced on {yon(P173)} in the compacting
garbage collector.
We put a forwarding address in the cell referenced by %3x%*;
then allocate a new pair of half-cells, putting %3F%* in the first and a pointer to
%3(A#B#C)%* in the second.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
x ⊂ααααααααα⊃ ⊂αααααααα⊃ ⊂αααααααα⊃
%→αα→~i #αβαααα→~↓ F ~ ⊂→αα→~↓ A ~
εαααααααααλ εααααααααλ ↑ εααααααααλ
z →α→~↓ X ~ ~* #αβα$ ~↓ B ~
εαααααααααλ %αααααααα$ εααααααααλ
~/ Y ~ ~/ C ~
%ααααααααα$ %αααααααα$
.BOXB
.END
.FP
These forwarding addresses are an instance of %2⊗→invisible pointers↔←%*
proposed by R. Greenblatt in his LISP machine; he has also proposed
implementing %3cdr%*-coding in hardware.
Between invisible pointers and
%3cdr%*-coding, we %6can%* effect all LISP operations using this
potentially more compact representation.
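.FP
The forwarding discipline itself is easy to sketch. In the following Common Lisp fragment,
cells are modeled as a structure with an extra %3forward%* field, and every access chases
forwarding addresses first; the structure and names are inventions for this illustration,
and the %3cdr%*-coded block layout is ignored.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; a sketch of invisible (forwarding) pointers
(defstruct cell head tail (forward nil))

(defun chase (c)
  "Follow forwarding addresses to the cell that really holds the data."
  (if (and (cell-p c) (cell-forward c))
      (chase (cell-forward c))
      c))

(defun fetch-head (c) (cell-head (chase c)))
(defun fetch-tail (c) (cell-tail (chase c)))

(defun smash-tail (c new-tail)
  "Replace the cdr of a compactly stored cell: build an ordinary cell
elsewhere and leave a forwarding address at the old location."
  (let* ((old   (chase c))
         (fresh (make-cell :head (cell-head old) :tail new-tail)))
    (setf (cell-forward old) fresh)
    fresh))

;; (let ((x (make-cell :head 'f :tail 'something)))
;;   (smash-tail x (list 'a 'b 'c))
;;   (fetch-tail x))      =>  (A B C)
.BOXB
.END
.FP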
We must be able to maintain that compact representation while the program
is running. This requires more care in the management of storage.
We cannot simply garbage collect and let space fragment; nor can we
use the simple compacting garbage collector
discussed in {yonss(P291)}, since it makes no
attempt to maintain the compact representation.
Several algorithms
with the desired properties exist (⊗↑[Che#70]↑, ⊗↑[Cla#76]↑).
One feature of this data representation is its use of variable-sized
linked blocks of sequential storage. The management of these storage
blocks is more complex than that of simple dotted#pairs, but the
additional overhead may be acceptable if it gives better locality of reference
and faster access to list elements.⊗↓Notice that the %3cdr%1-coded representations
of %3(A#B)%1 and %3(A#.#B)%1 are equally expensive. In the typical linked-list
representation, %3(A#B)%1 requires more space than %3(A#.#B)%1.←
There is less conflict about the use of more complex storage
management techniques in the area of LISP's dynamic implementation.
The original versions of LISP#1.5 used dotted pair structure to represent
the access environments.⊗↓The control information did use a stack implementation
coded in machine language.← This generality allowed a correct solution to the
implementation of %3function%1, but experience with LISP implementations
has shown that it is quite expensive to maintain this generality
when most applications are of a less general nature.
Implementation techniques, patterned after our Weizenbaum
diagrams, allow some economies without loss of generality.
Again, storage would be allocated in sequential blocks; each block would
be of a size sufficient to hold the representation of the name-value entries
along with the additional areas to link the block to the environment.
The storage blocks need not be allocated sequentially; indeed, in the
general case blocks cannot be allocated sequentially. The de-allocation
problems are somewhat different from those experienced
by data structure representations.
The environment structures are much more "well#behaved" than general list-structures.
Therefore an "environment garbage collector" may not be needed.
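.FP
A sketch of such a block in Common Lisp might look as follows; the %3frame%* structure
and the %3lookup%* function are inventions for this illustration, with the name-value
entries held in sequential vectors and a single link to the enclosing block.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; one environment block: parallel vectors of names and values, plus a link
(defstruct frame names values (link nil))

(defun lookup (name env)
  "Search the chain of blocks for NAME and return its value."
  (if (null env)
      (error "unbound variable ~S" name)
      (let ((i (position name (frame-names env))))
        (if i
            (aref (frame-values env) i)
            (lookup name (frame-link env))))))

;; (lookup 'y (make-frame :names  (vector 'x 'y)
;;                        :values (vector 1 2)
;;                        :link   nil))           =>  2
.BOXB
.END
.FP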
The most general techniques for management of LISP's dynamic
environment are based on ⊗↑[Bob#73a]↑ and succeeding papers.⊗↓There is something
contradictory about LISP implementors' attitudes toward storage
and dynamics. Much effort is expended in attempting to
minimize the overhead involved in the dynamic operation of LISP;
it is frequently stated that users should not be penalized
for access/control constructs which they do not use. However, that
attitude is not extended to LISP's data structures. There are very
generous subsets of LISP applications in which the data structure operations
are suitably well-behaved, so that storage reclamation techniques
less general than garbage collection are applicable. Analysis of this
area of LISP should lead to profitable results.←
At a lower level of implementation, LISP has much to say about machine
organization. The implementation of efficient environment-swapping
algorithms is a problem which any operating system must face.
The traditional solutions impose severe restrictions on interprocess
communications. The algorithms developed for LISP show promise
for giving efficient implementations of more general scope.
LISP's organization of memory also has lessons for machine architecture. The
management of large variable-sized memory spaces, like those described in ⊗↑[Ste#73]↑ and ⊗↑[Wegb#70]↑,
can be supported in hardware. The allocation and de-allocation of such large
spaces also require care; LISP implementors have begun to address these problems
(⊗↑[Ste#76a]↑, ⊗↑[Bis#74a]↑).
Finally, the highly interactive nature of modern LISP programming
systems gives direction to efforts developing more "reactive"
programming systems (⊗↑[Win#75]↑).
.SS(Hash Techniques,,P248:)
.FP
One very significant problem experienced by a LISP programmer is the
sheer size of data structures which a large LISP program generates.
Many LISP projects approach 10%87%1 bits of program and data.
Several techniques have been developed to help shrink data representation;
%3cdr%1-coding ({yonss(P224)}) is one technique. Another technique stems
from the observation that LISP tends to %6copy%1 structures rather
than %6share%1 them. We know that the sharing of structures must be done
with great care if modification operations like %3rplaca%1 and %3rplacd%1
are present, but sharing of structure can mean a significant saving in space.
In fact, the saving can also be reflected in the algorithms which manipulate
the structures. For example, if every list#structure is stored uniquely,
then the time for the equality test %3equal%1 is constant rather than being
proportional to the size of the structure.
There are two alternatives for maintaining unique structure:
either maintain list space such that unique representations are always
present or supply an algorithm which will "uniquize" structures upon
request. The first alternative is usually called %2⊗→hash consing↔←%1; the
second technique is called %2list condensation%1#(⊗↑[Lin#73]↑).
A condensation algorithm must remove all duplicated structure from within
a list. Since condensation is a component of many hashed LISP implementations,
we will concentrate our attention on hash consing.
Hash consing is an extension of the LISP technique for generating
unique atoms. Since
list structure is created only by the %3cons%1 operation,⊗↓However, list structure
may be modified by %3rplaca%1 and %3rplacd%1.←
we place the responsibility for maintaining unique structure on %3cons%1.
If the result of a pending %3cons%1 is already present, we
return a pointer to that structure; otherwise we perform the %3cons%1
and record the result so that it will be retrieved if the same
%3cons%1 happens again. The adjective "hash" is applied to this version of
%3cons%1 since the typical implementation uses a hashing algorithm to
maintain the uniqueness.
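.FP
A minimal sketch of such a %3cons%* in Common Lisp follows. The two levels of EQL hash
tables are merely one convenient way to obtain the effect; the names are inventions for
this example, and a real implementation would use its own table machinery.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; hash consing: every distinct (car . cdr) combination is built exactly once
(defvar *hcons-table* (make-hash-table :test 'eql))

(defun hcons (a d)
  "Like CONS, but returns the existing cell when one with exactly this
car and cdr (in the EQ sense) has been built before."
  (let ((row (or (gethash a *hcons-table*)
                 (setf (gethash a *hcons-table*)
                       (make-hash-table :test 'eql)))))
    (or (gethash d row)
        (setf (gethash d row) (cons a d)))))

;; with unique structure, structural equality collapses to pointer equality:
;; (eq (hcons 'a (hcons 'b nil)) (hcons 'a (hcons 'b nil)))   =>  T
.BOXB
.END
.FP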
Hash %3cons%1ing imposes restrictions on the
programmer's use of list modification operations. When unique copies
are shared, severe difficulties result if modifications are made.
One either may disallow list modification or may supply additional operations
to copy structure, modify it, and "uniquize" the result. Or an implementation may
supply different kinds of structures, some modifiable and some not modifiable.
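.FP
The alternative of copying, modifying, and then "uniquizing" the result is a short
exercise given the %3hcons%* sketched above; the following fragment, again only a sketch,
simply rebuilds a structure bottom-up through it.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; rebuild an ordinary structure through HCONS so that the result is the
;; unique stored copy
(defun uniquize (x)
  (if (consp x)
      (hcons (uniquize (car x)) (uniquize (cdr x)))
      x))

;; (eq (uniquize (list 'a 'b)) (uniquize (list 'a 'b)))   =>  T
.BOXB
.END
.FP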
A hash %3cons%1 was proposed for LISP 1.75 on the IBM#M44, but the
implementation was never
completed. A limited version of hash %3cons%1ing was implemented as
an extension of LISP 1.6 at Stanford.
The most impressive and extensive applications of hashing appear
in HLISP (⊗↑[Got#74]↑, ⊗↑[Ter#75]↑). That implementation of LISP
supplies two different kinds of structures: ordinary list structure
and "monocopy" structures. Operations are also supplied for
conversion between types. Extensive analysis of hashing effects
on algorithm performance has also been done (⊗↑[Got#76]↑).
HLISP also employs hashing in its implementation of property lists.
Property lists are not stored explicitly, but rather the atom name and the
property name are used to form a hash index; the property value is associated
with that hash index.
For example,
%3get[x;i]%1 hashes with both %3x%1 and %3i%1 to retrieve the property value.
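.FP
In Common Lisp the idea can be sketched with a single global table keyed on the pair of
atom and property name; the names %3put%* and %3get-prop%* below are inventions for this
illustration and do not reflect HLISP's actual interface.
.BEGIN SELECT 3;NOFILL;GROUP;
.BOXA
;; property values live in one global table, keyed by (atom . property)
(defvar *property-table* (make-hash-table :test 'equal))

(defun put (x i v)
  (setf (gethash (cons x i) *property-table*) v))

(defun get-prop (x i)
  (gethash (cons x i) *property-table*))

;; (put 'turing 'occupation 'logician)
;; (get-prop 'turing 'occupation)     =>  LOGICIAN
.BOXB
.END
.FP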
The other major implementations of LISP also
offer specialized operations for dealing with hashed quantities;
see ⊗↑[Moo#74]↑, ⊗↑[Int#75]↑, and ⊗↑[Bob#75]↑.